Packages for today:

library(tidyverse)
library(ggbeeswarm)
library(janitor)
library(glmmTMB)
library(car)
library(emmeans)
library(DHARMa)
library(DiagrammeR) # optional, only used to draw model diagrams

Goals for today:

  • Understand when to use Generalized Linear Mixed Models
  • Know how to fit Generalized Linear Mixed Models in glmmTMB, including random intercepts and random slopes
  • Interpret and predict from GLMMs
  • Understand different types of model predictions and Jensen's inequality

A Generalized Linear Mixed Model (GLMM) is a GLM with random effects. We have seen that we need GLMs to model response variables corresponding to presence/absence or count data. We have seen that we use random effects to account for structure (such as spatial or temporal structure) in the unexplained variation of a response variable. Now we put GLMs and random effects together in the same model.

You should already be familiar with most features of GLMs and random effects. Today we will mostly focus on the special, counter-intuitive aspects of GLMMs.

Binary GLMM, varying effect

The presence of a rare beetle has been monitored on 22 consecutive days across 24 locations where it is known to occur, in order to establish a standardized monitoring protocol before looking for the beetle in locations where it has not been detected yet. We suspect the beetle is more likely to be detected at higher temperatures because of increased activity. We want to test the effect of temperature on the probability of detecting the species and check whether temperature has a consistent effect across days and locations.

We can visualise the overall effect of temperature:

beetles <- read_csv("beetle_detection.csv")
## Parsed with column specification:
## cols(
##   temperature = col_double(),
##   presence = col_double(),
##   site = col_character(),
##   day = col_double()
## )
str(beetles)
## tibble [528 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ temperature: num [1:528] 7.91 15.03 18.91 2.44 15.76 ...
##  $ presence   : num [1:528] 0 0 1 0 0 0 0 0 0 0 ...
##  $ site       : chr [1:528] "A" "B" "C" "D" ...
##  $ day        : num [1:528] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   temperature = col_double(),
##   ..   presence = col_double(),
##   ..   site = col_character(),
##   ..   day = col_double()
##   .. )
beetles %>% mutate(site_fct = factor(site)) %>% summary()
##   temperature        presence          site                day       
##  Min.   :-2.601   Min.   :0.0000   Length:528         Min.   : 1.00  
##  1st Qu.:10.450   1st Qu.:0.0000   Class :character   1st Qu.: 6.00  
##  Median :13.621   Median :1.0000   Mode  :character   Median :11.00  
##  Mean   :13.705   Mean   :0.5833                      Mean   :11.59  
##  3rd Qu.:16.873   3rd Qu.:1.0000                      3rd Qu.:18.00  
##  Max.   :29.040   Max.   :1.0000                      Max.   :22.00  
##                                                                      
##     site_fct  
##  A      : 22  
##  B      : 22  
##  C      : 22  
##  D      : 22  
##  E      : 22  
##  F      : 22  
##  (Other):396
ggplot(beetles, aes(x=temperature, y = presence, color=site))  + geom_beeswarm(groupOnX = FALSE)

And the variation in mean detection across locations, and across days:

beetles %>% group_by(site) %>%
  summarise(prob = mean(presence), 
            SE=sd(presence)/sqrt(length(presence)-1), lowCI=prob-1.96*SE,
            upCI=prob+1.96*SE) %>%
  ggplot(aes(x=site, y=prob)) + geom_point() + geom_errorbar(aes(x=site, ymin=lowCI, ymax=upCI)) + 
  ylim(0,1)
## `summarise()` ungrouping output (override with `.groups` argument)

beetles %>% group_by(day) %>%
  summarise(prob = mean(presence), 
            SE=sd(presence)/sqrt(length(presence)-1), lowCI=prob-1.96*SE,
            upCI=prob+1.96*SE) %>%
  ggplot(aes(x=day, y=prob)) + geom_point() + geom_errorbar(aes(x=day, ymin=lowCI, ymax=upCI)) + 
  ylim(-0.2,1.2)
## `summarise()` ungrouping output (override with `.groups` argument)

It looks like there is more variation among days than among locations, but there could be some clustering in both days and locations; we should definitely account for both in our models. Notice how the approximate confidence intervals we drew for days are wrong: two of them extend above 1, although 1 is the logical maximum value.
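As a sketch of a better-behaved alternative (base R only, with hypothetical counts of 20 detections out of 22 visits on one day), an exact Clopper-Pearson interval from binom.test() is bounded to [0, 1] by construction, unlike the normal approximation:

```r
# Hypothetical counts: 20 detections in 22 visits on one day.
p_hat <- 20 / 22
se <- sqrt(p_hat * (1 - p_hat) / 22)
p_hat + c(-1.96, 1.96) * se          # normal approximation: upper limit exceeds 1
binom.test(x = 20, n = 22)$conf.int  # exact interval stays within [0, 1]
```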

Most sites were monitored once a day; but sometimes sites were missed, or monitored twice on the same day. That may reflect logistical problems.

tabyl(beetles, var1 = day, var2 = site)
##  day A B C D E F G H I J K L M N O P Q R S T U V W X
##    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##    2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
##    3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1
##    4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##    5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1
##    6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
##    7 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
##    8 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##    9 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1
##   10 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
##   11 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   12 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   13 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
##   14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1
##   15 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
##   16 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1
##   17 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1
##   18 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
##   19 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1
##   20 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
##   21 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1
##   22 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2

There are probably not enough measurements to estimate site-by-day interactions. We will not be able to model space-time interactions, although we must keep in mind that some interesting variation could exist there. We simply cannot know from these data.

The most basic model we could fit to answer our question is:

m_beetle_t <- glmmTMB(data = beetles, formula = presence ~ 1 + temperature, family = "binomial")
summary(m_beetle_t)
##  Family: binomial  ( logit )
## Formula:          presence ~ 1 + temperature
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##    663.2    671.7   -329.6    659.2      526 
## 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.66420    0.29634  -5.616 1.96e-08 ***
## temperature  0.14900    0.02132   6.988 2.80e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(m_beetle_t, type=3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: presence
##              Chisq Df Pr(>Chisq)    
## (Intercept) 31.537  1  1.957e-08 ***
## temperature 48.826  1  2.797e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
node [shape = square, color = red]
presence; temp

node [shape = circle, color = black]
p; y

node [shape = diamond, color=green]
intercept; 

y -> p [label = '  plogis']; 
p -> presence [label = '  Bernoulli', style=dashed];
intercept ->y; temp -> slope -> y ;
}")

Model diagnostics look about right:

testResiduals(m_beetle_t)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.029792, p-value = 0.7368
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 0.99977, p-value = 0.992
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, simulations = 528, p-value = 0.02557
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.000000000 0.006962165
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

The only significant test is the outlier test; but since we observed zero outliers (fewer than expected by chance, not more), we would not worry much about that.
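The expected outlier frequency printed above (0.007968...) is simply 2/(number of simulations + 1), assuming DHARMa's default of 250 simulations; a quick check:

```r
# DHARMa's expected outlier frequency with the default 250 simulations:
2 / (250 + 1)
```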

However, that model is not correct conceptually. We know data were collected on different days and different locations, and we saw apparent variation among days and locations. We must account for days and locations before we can trust any result.

We could do this using fixed effects:

m_beetle_t_s_d <- glmmTMB(data = beetles, formula = presence ~ 1 + temperature + site + as.factor(day),
                      family = "binomial")

summary(m_beetle_t_s_d)
##  Family: binomial  ( logit )
## Formula:          presence ~ 1 + temperature + site + as.factor(day)
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##    408.3    604.7   -158.1    316.3      482 
## 
## 
## Conditional model:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.638e+00  1.037e+00  -5.435 5.49e-08 ***
## temperature       3.069e-01  4.092e-02   7.500 6.36e-14 ***
## siteB            -3.049e-01  9.929e-01  -0.307 0.758765    
## siteC             9.431e-01  1.001e+00   0.942 0.345950    
## siteD             2.274e+00  1.153e+00   1.973 0.048522 *  
## siteE            -3.838e-01  9.206e-01  -0.417 0.676712    
## siteF             5.618e-01  9.012e-01   0.623 0.533046    
## siteG             9.941e-01  1.025e+00   0.970 0.332156    
## siteH             2.493e-01  9.566e-01   0.261 0.794351    
## siteI             6.353e-01  9.545e-01   0.666 0.505710    
## siteJ             8.960e-01  9.773e-01   0.917 0.359284    
## siteK             2.539e-01  1.008e+00   0.252 0.801120    
## siteL            -4.761e-01  1.001e+00  -0.476 0.634260    
## siteM            -7.680e-01  1.083e+00  -0.709 0.478397    
## siteN            -4.796e-01  9.602e-01  -0.499 0.617471    
## siteO            -5.048e-01  9.312e-01  -0.542 0.587773    
## siteP            -5.336e-01  9.452e-01  -0.565 0.572407    
## siteQ             2.550e-01  9.375e-01   0.272 0.785657    
## siteR             2.714e-01  1.046e+00   0.260 0.795165    
## siteS             5.607e-01  9.983e-01   0.562 0.574368    
## siteT            -1.427e-01  9.543e-01  -0.150 0.881105    
## siteU             3.543e-01  9.505e-01   0.373 0.709314    
## siteV            -2.734e-01  9.928e-01  -0.275 0.783049    
## siteW            -3.250e-01  9.662e-01  -0.336 0.736606    
## siteX            -1.147e+00  1.002e+00  -1.144 0.252581    
## as.factor(day)2   1.142e+00  8.163e-01   1.399 0.161878    
## as.factor(day)3   2.208e+00  8.249e-01   2.676 0.007444 ** 
## as.factor(day)4   5.155e+00  1.229e+00   4.193 2.75e-05 ***
## as.factor(day)5   3.167e+00  8.058e-01   3.931 8.46e-05 ***
## as.factor(day)6   9.492e-01  8.109e-01   1.171 0.241788    
## as.factor(day)7   4.322e+00  9.847e-01   4.389 1.14e-05 ***
## as.factor(day)8   2.608e+00  7.907e-01   3.298 0.000974 ***
## as.factor(day)9   3.198e-01  7.707e-01   0.415 0.678169    
## as.factor(day)10  2.800e+01  5.503e+04   0.001 0.999594    
## as.factor(day)11  2.682e+01  3.793e+04   0.001 0.999436    
## as.factor(day)12  8.472e-01  8.288e-01   1.022 0.306727    
## as.factor(day)13  2.631e+01  3.679e+04   0.001 0.999430    
## as.factor(day)14 -7.080e-01  9.019e-01  -0.785 0.432420    
## as.factor(day)15  1.332e+00  9.314e-01   1.430 0.152815    
## as.factor(day)16  2.596e+01  1.699e+04   0.002 0.998781    
## as.factor(day)17  2.672e+00  8.649e-01   3.090 0.002004 ** 
## as.factor(day)18  1.913e+00  7.585e-01   2.523 0.011648 *  
## as.factor(day)19 -2.248e+01  1.114e+04  -0.002 0.998390    
## as.factor(day)20  3.264e-01  8.212e-01   0.398 0.690991    
## as.factor(day)21 -2.968e+00  1.127e+00  -2.633 0.008466 ** 
## as.factor(day)22  2.261e+00  7.585e-01   2.981 0.002875 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(m_beetle_t_s_d, type=3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: presence
##                 Chisq Df Pr(>Chisq)    
## (Intercept)    29.536  1  5.490e-08 ***
## temperature    56.257  1  6.359e-14 ***
## site           18.678 23     0.7198    
## as.factor(day) 79.573 21  9.518e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That works, but it is inefficient in several ways:

  • The summary is very long and full of parameter estimates we do not really care about.
  • The effect of each day and of each location is estimated independently of the other days and locations, respectively; yet it seems reasonable to assume that we could learn what a typical day effect or site effect looks like from the data, and then use that information to refine our estimates of the day and site effects.
  • Check the parameter estimates and standard errors for days 10, 11, 13 and 19: they are arbitrarily large (in absolute value). That is because we observed only 1s or only 0s on those days, so infinitely many parameter values are consistent with the data. Following the previous point, we could refine those estimates by learning from the other levels.

Demonstration: whether the logit-scale prediction is 28 or 100000, you get the same data:

rbinom(n = 1000, size = 1, prob = plogis(28))
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## (... output truncated: all 1000 draws are 1)
rbinom(n = 1000, size = 1, prob = plogis(100000))
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## (... output truncated: all 1000 draws are 1)
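Indeed, the inverse-link function saturates; a quick base-R check:

```r
# plogis() returns essentially 1 for any large logit value:
plogis(c(5, 28, 100000))
1 - plogis(28)  # the difference from 1 is negligible (about 7e-13)
```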
  • Finally, because we estimate parameters independently of each other, if we try to check whether the effect of temperature is consistent among days we get a convergence problem:
m_beetle_tXd_s <- glmmTMB(data = beetles, formula = presence ~ 1 + temperature*as.factor(day) + site,
                      family = "binomial")
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
summary(m_beetle_tXd_s)
##  Family: binomial  ( logit )
## Formula:          presence ~ 1 + temperature * as.factor(day) + site
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##       NA       NA       NA       NA      461 
## 
## 
## Conditional model:
##                                Estimate Std. Error z value Pr(>|z|)
## (Intercept)                  -8.649e+00         NA      NA       NA
## temperature                   5.479e-01         NA      NA       NA
## as.factor(day)2               3.624e+00         NA      NA       NA
## as.factor(day)3               6.479e+00         NA      NA       NA
## as.factor(day)4              -2.896e+03         NA      NA       NA
## as.factor(day)5               3.269e+00         NA      NA       NA
## as.factor(day)6               1.030e+00         NA      NA       NA
## as.factor(day)7               1.146e+01         NA      NA       NA
## as.factor(day)8              -1.407e+00         NA      NA       NA
## as.factor(day)9               1.869e+00         NA      NA       NA
## as.factor(day)10              2.349e+03         NA      NA       NA
## as.factor(day)11              4.896e+02         NA      NA       NA
## as.factor(day)12              2.422e+00         NA      NA       NA
## as.factor(day)13              3.396e+02         NA      NA       NA
## as.factor(day)14              7.758e-01         NA      NA       NA
## as.factor(day)15              2.121e+00         NA      NA       NA
## as.factor(day)16              1.576e+03         NA      NA       NA
## as.factor(day)17              8.591e+00         NA      NA       NA
## as.factor(day)18              6.852e+00         NA      NA       NA
## as.factor(day)19             -1.422e+01         NA      NA       NA
## as.factor(day)20              4.613e+00         NA      NA       NA
## as.factor(day)21              7.594e-01         NA      NA       NA
## as.factor(day)22              7.275e+00         NA      NA       NA
## siteB                        -6.374e-01         NA      NA       NA
## siteC                         2.842e-01         NA      NA       NA
## siteD                         2.065e+00         NA      NA       NA
## siteE                        -7.897e-01         NA      NA       NA
## siteF                         2.460e-01         NA      NA       NA
## siteG                         6.209e-01         NA      NA       NA
## siteH                        -1.252e-01         NA      NA       NA
## siteI                         5.858e-01         NA      NA       NA
## siteJ                         5.071e-01         NA      NA       NA
## siteK                        -1.377e-01         NA      NA       NA
## siteL                        -1.063e+00         NA      NA       NA
## siteM                        -1.153e+00         NA      NA       NA
## siteN                        -1.096e+00         NA      NA       NA
## siteO                        -1.134e+00         NA      NA       NA
## siteP                        -9.464e-01         NA      NA       NA
## siteQ                        -4.196e-02         NA      NA       NA
## siteR                        -1.324e-01         NA      NA       NA
## siteS                         2.070e-01         NA      NA       NA
## siteT                        -5.242e-01         NA      NA       NA
## siteU                         4.429e-02         NA      NA       NA
## siteV                        -1.028e+00         NA      NA       NA
## siteW                        -5.191e-01         NA      NA       NA
## siteX                        -1.499e+00         NA      NA       NA
## temperature:as.factor(day)2  -1.607e-01         NA      NA       NA
## temperature:as.factor(day)3  -3.169e-01         NA      NA       NA
## temperature:as.factor(day)4   4.107e+02         NA      NA       NA
## temperature:as.factor(day)5   4.162e-02         NA      NA       NA
## temperature:as.factor(day)6   1.288e-03         NA      NA       NA
## temperature:as.factor(day)7  -5.244e-01         NA      NA       NA
## temperature:as.factor(day)8   4.014e-01         NA      NA       NA
## temperature:as.factor(day)9  -1.255e-01         NA      NA       NA
## temperature:as.factor(day)10  4.917e+01         NA      NA       NA
## temperature:as.factor(day)11  4.128e+02         NA      NA       NA
## temperature:as.factor(day)12 -1.177e-01         NA      NA       NA
## temperature:as.factor(day)13  1.952e+02         NA      NA       NA
## temperature:as.factor(day)14 -1.234e-01         NA      NA       NA
## temperature:as.factor(day)15 -4.775e-02         NA      NA       NA
## temperature:as.factor(day)16  2.476e+02         NA      NA       NA
## temperature:as.factor(day)17 -4.601e-01         NA      NA       NA
## temperature:as.factor(day)18 -3.623e-01         NA      NA       NA
## temperature:as.factor(day)19 -2.151e+00         NA      NA       NA
## temperature:as.factor(day)20 -3.002e-01         NA      NA       NA
## temperature:as.factor(day)21 -2.596e-01         NA      NA       NA
## temperature:as.factor(day)22 -3.649e-01         NA      NA       NA

So it is often more convenient and efficient to use random effects for categorical variables whose exact parameter estimates we do not care too much about.

To fit a random effect in lme4 or glmmTMB, you write + (1 | group) to get a random intercept for each level of the variable group.
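For instance (a sketch of the syntax only, using hypothetical formula objects; the (1 + x | group) form adds a random slope, which we will meet later):

```r
# Random intercepts for site and day (the model fitted below):
f_intercepts <- presence ~ 1 + temperature + (1 | day) + (1 | site)
# Random intercept AND random slope of temperature, varying by day:
f_slopes <- presence ~ 1 + temperature + (1 + temperature | day)
all.vars(f_slopes)  # variables involved: presence, temperature, day
```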

m_beetle_t_s_d  <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature + (1|day) + (1|site), family = "binomial")
summary(m_beetle_t_s_d)
##  Family: binomial  ( logit )
## Formula:          presence ~ 1 + temperature + (1 | day) + (1 | site)
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##    433.4    450.5   -212.7    425.4      524 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance  Std.Dev. 
##  day    (Intercept) 8.470e+00 2.910e+00
##  site   (Intercept) 4.127e-09 6.424e-05
## Number of obs: 528, groups:  day, 22; site, 24
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.88933    0.79076  -3.654 0.000258 ***
## temperature  0.27667    0.03523   7.854 4.04e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That is not only much cleaner to read, but also more efficient computationally.

library(DiagrammeR)
grViz("
digraph boxes_and_circles {
node [shape = square, color = red]
presence; temp

node [shape = circle, color = black]
p; y; baseline; noise

node [shape = diamond, color=green]
intercept; 

y -> p [label = '  plogis']; 
p -> presence [label = '  Bernoulli', style=dashed];
intercept -> baseline ; noise -> baseline;
baseline -> y;
temp -> slope -> y ; 
V_site -> noise [label = '  Gaussian', style=dashed];
V_day -> noise [label = '  Gaussian', style=dashed];

}")

One small annoyance: the Anova() function tests only the fixed effects.

Anova(m_beetle_t_s_d, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: presence
##              Chisq Df Pr(>Chisq)    
## (Intercept) 13.351  1  0.0002583 ***
## temperature 61.682  1  4.036e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
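For a single-parameter term, the Wald chi-square is just the squared z value from summary(); checking against the values above:

```r
# Wald chi-square = z^2 for one-degree-of-freedom terms:
7.854^2     # temperature: matches the 61.682 above (up to rounding)
(-3.654)^2  # intercept: matches the 13.351 above
```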

To test random effects you need to compare nested models with/without a given random effect using anova() (lower case!).

m_beetle_t_s <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature+ (1|site), family = "binomial")
m_beetle_t_d <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature+ (1|day), family = "binomial")

Test for the random intercept of site:

anova(m_beetle_t_d, m_beetle_t_s_d)
## Data: beetles
## Models:
## m_beetle_t_d: presence ~ 1 + temperature + (1 | day), zi=~0, disp=~1
## m_beetle_t_s_d: presence ~ 1 + temperature + (1 | day) + (1 | site), zi=~0, disp=~1
##                Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## m_beetle_t_d    3 431.45 444.25 -212.72   425.45                        
## m_beetle_t_s_d  4 433.45 450.52 -212.72   425.45     0      1          1

No evidence for variance in the intercept due to site differences. But it does not really hurt to keep the effect of site in the model.

Test for the random intercept of day:

anova(m_beetle_t_s, m_beetle_t_s_d)
## Data: beetles
## Models:
## m_beetle_t_s: presence ~ 1 + temperature + (1 | site), zi=~0, disp=~1
## m_beetle_t_s_d: presence ~ 1 + temperature + (1 | day) + (1 | site), zi=~0, disp=~1
##                Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## m_beetle_t_s    3 665.20 678.01 -329.60   659.20                             
## m_beetle_t_s_d  4 433.45 450.52 -212.72   425.45 233.75      1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Strong evidence for variance in the intercept due to day differences.
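As a check, the chi-square statistic anova() reports is just twice the difference in log-likelihoods printed above (-329.60 and -212.72):

```r
# Likelihood-ratio statistic by hand, from the logLik values above:
chisq <- 2 * (-212.72 - (-329.60))
chisq                                      # matches the 233.75 in the table, up to rounding
pchisq(chisq, df = 1, lower.tail = FALSE)  # vanishingly small p-value
```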

Visualise results

Let's try to visualise this model:

emm_m_beetle_t_s_d <- as_tibble(emmeans(object = m_beetle_t_s_d, specs = ~ temperature,
        at= list(temperature=seq(min(beetles$temperature), max(beetles$temperature), length.out = 100)), 
        type="response"))
ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob), col="red") + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              fill="red", alpha=0.1) + facet_wrap(~day)

But what exactly are we seeing? This is the prediction for a typical day; that is, a day whose random effect is zero.

emmeans() does not give us fine control over this kind of prediction; predict() may serve us better here.

Let's consider predictions for two different days: day 19, when we never detected the beetle, and day 10, when we detected it at all sites.

newdata_day19 <- tibble(temperature = seq(min(beetles$temperature), max(beetles$temperature), length.out = 100), 
                  day=19, site="A")

newdata_day10 <- tibble(temperature = seq(min(beetles$temperature), max(beetles$temperature), length.out = 100), 
                  day=10, site="A")

We can set the random effects to zero to obtain predictions for a typical (median) day:

predict(m_beetle_t_s_d, re.form = ~ 0, newdata = newdata_day19, type = "response")
##   [1] 0.02636552 0.02873309 0.03130644 0.03410216 0.03713797 0.04043273
##   [7] 0.04400642 0.04788022 0.05207644 0.05661855 0.06153111 0.06683972
##  [13] 0.07257090 0.07875204 0.08541116 0.09257678 0.10027765 0.10854248
##  [19] 0.11739961 0.12687662 0.13699991 0.14779420 0.15928200 0.17148305
##  [25] 0.18441370 0.19808627 0.21250840 0.22768244 0.24360480 0.26026541
##  [31] 0.27764723 0.29572581 0.31446908 0.33383720 0.35378260 0.37425017
##  [37] 0.39517775 0.41649666 0.43813254 0.46000630 0.48203522 0.50413420
##  [43] 0.52621703 0.54819778 0.56999208 0.59151843 0.61269939 0.63346262
##  [49] 0.65374180 0.67347733 0.69261690 0.71111580 0.72893711 0.74605163
##  [55] 0.76243773 0.77808102 0.79297391 0.80711511 0.82050902 0.83316511
##  [61] 0.84509728 0.85632321 0.86686376 0.87674233 0.88598432 0.89461664
##  [67] 0.90266719 0.91016449 0.91713730 0.92361433 0.92962392 0.93519388
##  [73] 0.94035129 0.94512235 0.94953228 0.95360522 0.95736421 0.96083116
##  [79] 0.96402678 0.96697065 0.96968118 0.97217568 0.97447034 0.97658033
##  [85] 0.97851976 0.98030183 0.98193879 0.98344200 0.98482204 0.98608869
##  [91] 0.98725100 0.98831734 0.98929546 0.99019251 0.99101506 0.99176920
##  [97] 0.99246052 0.99309418 0.99367493 0.99420712
predict(m_beetle_t_s_d, re.form = ~ 0, newdata = newdata_day10, type = "response")
##   [1] 0.02636552 0.02873309 0.03130644 0.03410216 0.03713797 0.04043273
##   [7] 0.04400642 0.04788022 0.05207644 0.05661855 0.06153111 0.06683972
##  [13] 0.07257090 0.07875204 0.08541116 0.09257678 0.10027765 0.10854248
##  [19] 0.11739961 0.12687662 0.13699991 0.14779420 0.15928200 0.17148305
##  [25] 0.18441370 0.19808627 0.21250840 0.22768244 0.24360480 0.26026541
##  [31] 0.27764723 0.29572581 0.31446908 0.33383720 0.35378260 0.37425017
##  [37] 0.39517775 0.41649666 0.43813254 0.46000630 0.48203522 0.50413420
##  [43] 0.52621703 0.54819778 0.56999208 0.59151843 0.61269939 0.63346262
##  [49] 0.65374180 0.67347733 0.69261690 0.71111580 0.72893711 0.74605163
##  [55] 0.76243773 0.77808102 0.79297391 0.80711511 0.82050902 0.83316511
##  [61] 0.84509728 0.85632321 0.86686376 0.87674233 0.88598432 0.89461664
##  [67] 0.90266719 0.91016449 0.91713730 0.92361433 0.92962392 0.93519388
##  [73] 0.94035129 0.94512235 0.94953228 0.95360522 0.95736421 0.96083116
##  [79] 0.96402678 0.96697065 0.96968118 0.97217568 0.97447034 0.97658033
##  [85] 0.97851976 0.98030183 0.98193879 0.98344200 0.98482204 0.98608869
##  [91] 0.98725100 0.98831734 0.98929546 0.99019251 0.99101506 0.99176920
##  [97] 0.99246052 0.99309418 0.99367493 0.99420712

Or we can look at how predictions differ on different days:

predict(m_beetle_t_s_d, re.form = NULL, newdata = newdata_day19, type = "response")
##   [1] 0.0001027289 0.0001122256 0.0001226001 0.0001339336 0.0001463146
##   [6] 0.0001598399 0.0001746153 0.0001907562 0.0002083888 0.0002276510
##  [11] 0.0002486932 0.0002716798 0.0002967904 0.0003242212 0.0003541864
##  [16] 0.0003869199 0.0004226774 0.0004617378 0.0005044062 0.0005510152
##  [21] 0.0006019285 0.0006575431 0.0007182924 0.0007846498 0.0008571322
##  [26] 0.0009363040 0.0010227812 0.0011172365 0.0012204043 0.0013330860
##  [31] 0.0014561567 0.0015905711 0.0017373715 0.0018976949 0.0020727821
##  [36] 0.0022639868 0.0024727855 0.0027007888 0.0029497530 0.0032215930
##  [41] 0.0035183966 0.0038424392 0.0041962002 0.0045823810 0.0050039239
##  [46] 0.0054640326 0.0059661943 0.0065142041 0.0071121897 0.0077646397
##  [51] 0.0084764324 0.0092528674 0.0100996989 0.0110231710 0.0120300550
##  [56] 0.0131276896 0.0143240215 0.0156276492 0.0170478679 0.0185947160
##  [61] 0.0202790226 0.0221124560 0.0241075719 0.0262778616 0.0286377987
##  [66] 0.0312028832 0.0339896825 0.0370158672 0.0403002405 0.0438627586
##  [71] 0.0477245404 0.0519078644 0.0564361484 0.0613339107 0.0666267076
##  [76] 0.0723410452 0.0785042601 0.0851443658 0.0922898605 0.0999694928
##  [81] 0.1082119808 0.1170456835 0.1264982208 0.1365960415 0.1473639414
##  [86] 0.1588245328 0.1709976706 0.1838998435 0.1975435397 0.2119366015
##  [91] 0.2270815856 0.2429751482 0.2596074777 0.2769617994 0.2950139763
##  [96] 0.3137322328 0.3330770229 0.3530010646 0.3734495554 0.3943605786
predict(m_beetle_t_s_d, re.form = NULL, newdata = newdata_day10, type = "response")
##   [1] 0.6272756 0.6477068 0.6676114 0.6869349 0.7056305 0.7236587 0.7409880
##   [8] 0.7575947 0.7734621 0.7885808 0.8029475 0.8165650 0.8294414 0.8415893
##  [15] 0.8530252 0.8637692 0.8738440 0.8832744 0.8920869 0.9003091 0.9079696
##  [22] 0.9150968 0.9217197 0.9278668 0.9335658 0.9388444 0.9437287 0.9482445
##  [29] 0.9524162 0.9562671 0.9598195 0.9630945 0.9661120 0.9688907 0.9714483
##  [36] 0.9738014 0.9759653 0.9779545 0.9797825 0.9814618 0.9830040 0.9844200
##  [43] 0.9857197 0.9869124 0.9880067 0.9890106 0.9899313 0.9907755 0.9915496
##  [50] 0.9922592 0.9929097 0.9935058 0.9940522 0.9945528 0.9950115 0.9954317
##  [57] 0.9958167 0.9961694 0.9964925 0.9967883 0.9970594 0.9973075 0.9975349
##  [64] 0.9977430 0.9979336 0.9981082 0.9982680 0.9984143 0.9985483 0.9986710
##  [71] 0.9987834 0.9988862 0.9989804 0.9990666 0.9991455 0.9992178 0.9992839
##  [78] 0.9993445 0.9993999 0.9994507 0.9994972 0.9995397 0.9995786 0.9996143
##  [85] 0.9996469 0.9996768 0.9997041 0.9997292 0.9997521 0.9997731 0.9997923
##  [92] 0.9998098 0.9998259 0.9998407 0.9998541 0.9998665 0.9998778 0.9998881
##  [99] 0.9998976 0.9999063

Let's generalize that and make predictions for each day at the same time:

newdata_days1_10_19 <- expand_grid(temperature = seq(min(beetles$temperature), max(beetles$temperature), length.out = 100), 
                  day=1:22, site="A")
newdata_days1_10_19$prob <- predict(m_beetle_t_s_d, re.form = NULL, 
                                    newdata = newdata_days1_10_19, type = "response")
ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob), col="red") + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              fill="red", alpha=0.1) + 
  geom_line(newdata_days1_10_19, mapping = aes(temperature, y=prob, col=as.factor(day)))+facet_wrap(~day)

If we want confidence intervals on those lines we need to calculate confidence intervals on the link scale and then apply the back-transformation:

newdata_days1_10_19 <- expand_grid(temperature = seq(min(beetles$temperature), max(beetles$temperature), length.out = 100), 
                  day=1:22, site="A")

newdata_days1_10_19 <- bind_cols(newdata_days1_10_19, as_tibble(predict(m_beetle_t_s_d, re.form = NULL, 
                                    newdata = newdata_days1_10_19, type = "link", se.fit = TRUE) ) )

newdata_days1_10_19 <- newdata_days1_10_19 %>%
  mutate(prob = plogis(fit),
         lowCI = plogis(fit - 1.96*se.fit),
         upCI = plogis(fit + 1.96*se.fit))
ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob)) + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              alpha=0.1) + 
  geom_line(newdata_days1_10_19, mapping = aes(x=temperature, y=prob, col=as.factor(day))) +
  geom_ribbon(newdata_days1_10_19, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI, fill=as.factor(day)), inherit.aes = FALSE, alpha=0.1) + facet_wrap(~day)

In black we see the median-day prediction (no random effects); in colour, the prediction for each individual day.

We can see quite contrasting predictions across days. On days 10 and 19, the model makes quite uncertain predictions; that is expected, because on those days we observed only 1s or only 0s, respectively. However, the random effect extracted enough information from the other days to predict that the probability of presence should not be exactly 1 at low temperatures on day 10, and not exactly 0 at high temperatures on day 19.
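The pooling of information across days works like the classic shrinkage of group means in a Gaussian random-intercept model: a group's predicted intercept is a compromise between its own data and the overall mean, weighted by the group sample size and the variance components. A minimal sketch with made-up numbers (all values below are illustrative, not from the beetle model):

```r
# Shrinkage of one group's mean toward the grand mean (Gaussian case):
# the BLUP weight is n / (n + sigma2_resid / sigma2_group)
grand_mean   <- 0.5
group_mean   <- 1.0   # observed mean in one group
n            <- 5     # few observations in that group
sigma2_resid <- 1     # residual variance
sigma2_group <- 0.2   # among-group variance

w <- n / (n + sigma2_resid / sigma2_group)
blup <- w * group_mean + (1 - w) * grand_mean
blup  # pulled from 1.0 toward 0.5
```

With few observations or small among-group variance, the weight w shrinks and the prediction relies more on the grand mean; that is why days 10 and 19 are not predicted at exactly 1 or 0.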

One last thing we may want to see is the average response across all days.

newdata_alldays <- expand_grid(temperature = seq(min(beetles$temperature), max(beetles$temperature), length.out = 100), 
                  day= unique(beetles$day), site="A")

newdata_alldays <- bind_cols(newdata_alldays, as_tibble(predict(m_beetle_t_s_d, re.form = NULL, 
                                newdata = newdata_alldays, type = "link", se.fit = TRUE)))
meanpred <- newdata_alldays %>% group_by(temperature) %>%
  summarise(mean_prob = mean(plogis(fit)), 
            lowCI = mean(plogis(fit-1.96*se.fit)), upCI = mean(plogis(fit + 1.96*se.fit)))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob)) + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              alpha=0.1) + 
  geom_line(newdata_days1_10_19, mapping = aes(x=temperature, y=prob, col=as.factor(day))) +
  geom_ribbon(newdata_days1_10_19, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI, fill=as.factor(day)), inherit.aes = FALSE, alpha=0.1) + 
  geom_line(meanpred, mapping = aes(x=temperature, y=mean_prob), col = "goldenrod") +
    geom_ribbon(meanpred, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI), inherit.aes = FALSE, fill="goldenrod", alpha=0.3) + facet_wrap(~day)

In addition to the previous lines, we can see in gold the prediction for the average presence across all days.

Finally, we could add a prediction line from a naive model that did not include random effects:

ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob)) + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              alpha=0.1) + 
    geom_line(meanpred, mapping = aes(x=temperature, y=mean_prob), col = "goldenrod", size=3) +
    geom_ribbon(meanpred, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI), inherit.aes = FALSE, fill="goldenrod", alpha=0.3) +
  geom_smooth(method="glm", method.args=list(family="binomial"))
## `geom_smooth()` using formula 'y ~ x'

Here, the naive prediction is quite similar to the mean prediction from the mixed model, because the dataset is quite well balanced, but the confidence interval is very different. The difference could be more dramatic in an unbalanced dataset.

If we facet-wrap by day, geom_smooth() fits a separate model for each day. You will see that the models fitted to each day alone are much worse than the global model using all the information at once. In particular, on some days there is not enough information to fit a glm() at all, while the predictions of the model using all the data appear reasonable.

ggplot(beetles, aes(x=temperature, y=presence)) + geom_beeswarm(groupOnX = FALSE) +
  geom_line(data = emm_m_beetle_t_s_d, mapping = aes(y=prob)) + 
  geom_ribbon(data = emm_m_beetle_t_s_d, mapping = aes(x=temperature, ymin = lower.CL, ymax= upper.CL),
              inherit.aes = FALSE, 
              alpha=0.1) + 
  geom_line(newdata_days1_10_19, mapping = aes(x=temperature, y=prob, col=as.factor(day))) +
  geom_ribbon(newdata_days1_10_19, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI, fill=as.factor(day)), inherit.aes = FALSE, alpha=0.1) + 
    geom_line(meanpred, mapping = aes(x=temperature, y=mean_prob), col = "goldenrod") +
    geom_ribbon(meanpred, mapping = aes(x=temperature, ymin=lowCI, ymax=upCI), inherit.aes = FALSE, fill="goldenrod", alpha=0.3) +
  geom_smooth(method="glm", method.args=list(family="binomial")) + facet_wrap(~day)
## `geom_smooth()` using formula 'y ~ x'
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
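The convergence warnings come from complete separation: on some days, only 0s were observed below a certain temperature and only 1s above it, so the per-day maximum-likelihood slope is infinite. A minimal sketch of the problem, with made-up data:

```r
# Complete separation: all 0s below the threshold, all 1s above it.
# The maximum-likelihood slope is infinite; glm() stops with huge,
# unstable estimates and the warnings seen above.
toy <- data.frame(temp = 1:10,
                  pres = rep(c(0, 1), each = 5))
m_sep <- glm(pres ~ temp, data = toy, family = binomial)
coef(m_sep)  # enormous slope, meaningless standard errors
```

The mixed model avoids this because each day's slope is shrunk toward the overall estimate.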

Average day, or average across days... Jensen's inequality

OK, now you should start to see why GLMMs can be confusing and complicated. The difference between these lines can seem quite subtle, but they have different meanings, and in some cases will tell very different stories.

In a linear mixed model there is no difference between median prediction and mean prediction because linear models assume effects are additive.

In GLMs, effects are additive only on the link scale, not on the scale of the data.
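A quick way to see this: the same step on the logit scale corresponds to very different changes in probability depending on where you start (the numbers here are arbitrary):

```r
# The same +1 on the logit (link) scale...
plogis(0 + 1) - plogis(0)  # ...near p = 0.5 changes p by about 0.23
plogis(4 + 1) - plogis(4)  # ...near p = 1 changes p by only about 0.011
```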

The distortion due to the change of scale can be simulated. First, let's draw random numbers from a normal distribution with mean 2 and standard deviation 2.

x1 <- rnorm(n = 10000, mean = 2, sd = 2)

What is their mean?

mean(x1)
## [1] 1.984147

(basically 2)

Now, let's apply a plogis() transformation to these numbers, as if we were back-transforming predictions from a binary GLM:

px1 <- plogis(x1)

You could expect the mean of px1 to be plogis(2), that is, plogis(mean(x1)), but...

plogis(mean(x1))
## [1] 0.8791225
mean(px1)
## [1] 0.7736642

Even more dramatically, we can apply an exp() transformation, as if we were back-transforming predictions from a Poisson GLM:

lx1 <- exp(x1)
mean(lx1)
## [1] 56.36405
exp(mean(x1))
## [1] 7.272838
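For the exp() case the size of the gap is even known analytically: if x follows a Normal(mu, sd) distribution, the mean of exp(x) is exp(mu + sd^2/2), not exp(mu). We can check this against the simulation above:

```r
# Analytical check of the simulated lognormal mean:
exp(2 + 2^2 / 2)  # E[exp(x)] = exp(mu + sd^2/2) = exp(4), about 54.6
exp(2)            # exp(mu), about 7.4
```

The simulated values (56.4 and 7.27) are close to these, up to sampling noise.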

The fact that the mean of the transformed values differs from the transformed value of the mean is known as "Jensen's inequality", and it is the basic reason why you need to be careful when making predictions with a GLMM. Imagine you have a GLMM with a random effect of year; what are you trying to do?

  • If you want to calculate something about a typical year, exclude the random effects from the calculation (that is what emmeans does).
  • If you want to calculate something about specific years, include those year levels in your calculations.
  • If you want to calculate something on average over all years, you need to integrate (= average) the back-transformed calculation over years.
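The three options can be sketched with simulated random effects (all numbers below are made up; only the structure matters):

```r
set.seed(42)
intercept <- -1; slope <- 0.3        # fixed effects, on the link scale
year_dev <- rnorm(1000, 0, 1.5)      # simulated year random intercepts
temp <- 10

# 1) a typical year: random effect set to 0
p_typical <- plogis(intercept + slope * temp)

# 2) one specific year
p_year1 <- plogis(intercept + slope * temp + year_dev[1])

# 3) average over years: back-transform first, THEN average
p_average <- mean(plogis(intercept + slope * temp + year_dev))

c(typical = p_typical, average = p_average)  # not equal: Jensen again
```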

Random slope

We have answered the first part of our question: temperature increases the probability of beetle detection. But is temperature a reliable indicator? Do different days or different sites respond differently to temperature?

We already saw that fitting a fixed-effect interaction between temperature and day did not work here; the model did not converge. What we need instead is a random effect on the slope of temperature.

You know the 1 + we write at the beginning of model formulas? It stands for the "intercept", and that is why we write random-intercept models as (1 | group). So, if we want a varying effect of temperature in addition to the random effect on the intercept, we can write (1 + temperature | group):

m_beetle_tXd_s <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature + (1 + temperature | day) +
                        (1|site), family = "binomial")
summary(m_beetle_tXd_s)
##  Family: binomial  ( logit )
## Formula:          
## presence ~ 1 + temperature + (1 + temperature | day) + (1 | site)
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##    436.3    461.9   -212.1    424.3      522 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance  Std.Dev.  Corr  
##  day    (Intercept) 1.355e+01 3.682e+00       
##         temperature 5.261e-03 7.253e-02 -0.83 
##  site   (Intercept) 5.797e-09 7.614e-05       
## Number of obs: 528, groups:  day, 22; site, 24
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.88018    0.95394  -3.019  0.00253 ** 
## temperature  0.26860    0.04191   6.408 1.47e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(m_beetle_tXd_s)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: presence
##              Chisq Df Pr(>Chisq)    
## temperature 41.068  1   1.47e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
node [shape = square, color = red]
presence; temp

node [shape = circle, color = black]
p; y; noise1; noise2; slopes; baselines;

node [shape = diamond, color=green]
intercept; slope;

y -> p [label = '  plogis']; 
p -> presence [label = '  Bernoulli', style=dashed];
intercept -> baselines ; 
noise1 -> baselines;
baselines -> y;
slope -> slopes;
noise2 -> slopes;
temp -> slopes -> y ;
V_site -> noise1 [label = '  Gaussian', style=dashed];
V_day -> noise2 [label = '  Gaussian', style=dashed];
V_day -> noise1 [label = '  Gaussian', style=dashed];

}")

The Anova() returns only a test for the fixed effect of temperature. How do we test whether the effect of temperature varied among days? We can compare a model with the varying slope to a model without it, using the function anova() (lower case!).

m_beetle_t_d_s <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature + (1 | day) + (1|site), 
                      family = "binomial")

anova(m_beetle_t_d_s, m_beetle_tXd_s)
## Data: beetles
## Models:
## m_beetle_t_d_s: presence ~ 1 + temperature + (1 | day) + (1 | site), zi=~0, disp=~1
## m_beetle_tXd_s: presence ~ 1 + temperature + (1 + temperature | day) + (1 | site), zi=~0, disp=~1
##                Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## m_beetle_t_d_s  4 433.45 450.52 -212.72   425.45                         
## m_beetle_tXd_s  6 436.30 461.91 -212.15   424.30 1.1483      2     0.5632

So, there is no strong evidence for a varying effect of temperature among days.
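To see where that p-value comes from, the likelihood-ratio statistic can be recomputed by hand from the two log-likelihoods reported above (we use rounded values, so the result differs slightly from the table). The random slope adds two parameters: the slope variance and its correlation with the intercept, hence df = 2.

```r
ll_without <- -212.72   # model without the varying slope
ll_with    <- -212.15   # model with the varying slope
chisq <- 2 * (ll_with - ll_without)        # about 1.14
pchisq(chisq, df = 2, lower.tail = FALSE)  # about 0.566, close to 0.5632
```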

What about variation in the effect of temperature among sites?

m_beetle_tXs_d <- glmmTMB(data = beetles,
                      formula = presence ~ 1 + temperature + (1 | day) +
                        (1+ temperature |site), family = "binomial")
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
summary(m_beetle_tXs_d)
##  Family: binomial  ( logit )
## Formula:          
## presence ~ 1 + temperature + (1 | day) + (1 + temperature | site)
## Data: beetles
## 
##      AIC      BIC   logLik deviance df.resid 
##       NA       NA       NA       NA      522 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance  Std.Dev.  Corr 
##  day    (Intercept) 8.470e+00 2.910e+00      
##  site   (Intercept) 3.832e-10 1.957e-05      
##         temperature 1.251e-11 3.538e-06 0.48 
## Number of obs: 528, groups:  day, 22; site, 24
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.88933    0.79076  -3.654 0.000258 ***
## temperature  0.27667    0.03523   7.854 4.04e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model has difficulty converging, so we cannot safely compare it to another model. However, the reason the model is not converging is itself our answer: the variance parameters are estimated to be effectively 0, so there is no evidence for a varying effect of temperature among sites.

Oh, and by the way, we should check the fit of our models. I have been ignoring that aspect a little to focus on other content, but in real work we should do it:

testResiduals(m_beetle_tXd_s)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.039023, p-value = 0.3973
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 0.99341, p-value = 0.72
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, simulations = 528, p-value = 0.02557
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.000000000 0.006962165
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

The outlier test is significant again, but only because we observe fewer outliers (zero) than expected; nothing looks too scary.

Random effect for over-dispersion in count data GLMMs

We study sexual selection in mosquitofish. We want to test whether the size of the male gonopodium (a modified anal fin used in display and mating) predicts reproductive success. We have studied fish in 14 different tanks and want to account for possible variation among tanks, so we will need a random effect for tank. The response variable, reproductive success, is a count, so we will probably use a count-data GLM. GLM + random effect = GLMM.

fish <- read_csv("fishrepro.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   reproductive_success = col_double(),
##   gonopodium_size = col_double(),
##   tank_id = col_double()
## )
summary(fish)
##        id         reproductive_success gonopodium_size     tank_id      
##  Min.   :   1.0   Min.   : 0.0000      Min.   : 7.190   Min.   : 1.000  
##  1st Qu.: 355.8   1st Qu.: 0.0000      1st Qu.: 9.349   1st Qu.: 4.000  
##  Median : 710.5   Median : 0.0000      Median :10.015   Median : 7.000  
##  Mean   : 710.5   Mean   : 0.7753      Mean   :10.016   Mean   : 7.387  
##  3rd Qu.:1065.2   3rd Qu.: 1.0000      3rd Qu.:10.652   3rd Qu.:11.000  
##  Max.   :1420.0   Max.   :45.0000      Max.   :13.390   Max.   :14.000
ggplot(fish, aes(x=reproductive_success))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(fish, aes(x=gonopodium_size, y=reproductive_success)) + geom_point()

tibble(tabyl(dat = fish, var1 = id))
## # A tibble: 1,420 x 3
##       id     n  percent
##    <dbl> <dbl>    <dbl>
##  1     1     1 0.000704
##  2     2     1 0.000704
##  3     3     1 0.000704
##  4     4     1 0.000704
##  5     5     1 0.000704
##  6     6     1 0.000704
##  7     7     1 0.000704
##  8     8     1 0.000704
##  9     9     1 0.000704
## 10    10     1 0.000704
## # … with 1,410 more rows

The variable id is just row number. It does not seem particularly useful for now... but wait for the twist.

model_poisson <- glmmTMB(reproductive_success ~ 1 + gonopodium_size + (1|tank_id), data = fish, family = poisson)
summary(model_poisson)
##  Family: poisson  ( log )
## Formula:          reproductive_success ~ 1 + gonopodium_size + (1 | tank_id)
## Data: fish
## 
##      AIC      BIC   logLik deviance df.resid 
##   3655.5   3671.3  -1824.8   3649.5     1417 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance Std.Dev.
##  tank_id (Intercept) 0.01442  0.1201  
## Number of obs: 1420, groups:  tank_id, 14
## 
## Conditional model:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -7.4969     0.3234  -23.18   <2e-16 ***
## gonopodium_size   0.6983     0.0299   23.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testResiduals(model_poisson)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.086981, p-value = 9.324e-10
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 1.9641, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 23, simulations = 1420, p-value = 0.002321
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.01029461 0.02420511
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01619718
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
node [shape = square, color = red]
repro; size

node [shape = circle, color = black]
lambda; y; noise; baselines;

node [shape = diamond, color=green]
intercept; slope;

y -> lambda [label = '  exp']; 
lambda -> repro [label = '  Poisson', style=dashed];
intercept -> baselines;
baselines ->y; size -> slope -> y ; noise -> baselines;
V_tank -> noise [label = '  Gaussian', style=dashed];
}")
The dispersion test shows clear over-dispersion (the ratio of observed to simulated residual standard deviation is close to 2). One option is to switch to a negative binomial family; nbinom1 assumes a variance proportional to the mean:

model_nbinom1 <- glmmTMB(reproductive_success ~ 1 + gonopodium_size + (1|tank_id), data = fish, family = nbinom1())
summary(model_nbinom1)
##  Family: nbinom1  ( log )
## Formula:          reproductive_success ~ 1 + gonopodium_size + (1 | tank_id)
## Data: fish
## 
##      AIC      BIC   logLik deviance df.resid 
##     3212     3233    -1602     3204     1416 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance  Std.Dev. 
##  tank_id (Intercept) 4.613e-09 6.792e-05
## Number of obs: 1420, groups:  tank_id, 14
## 
## Overdispersion parameter for nbinom1 family (): 1.25 
## 
## Conditional model:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -6.15651    0.43928  -14.02   <2e-16 ***
## gonopodium_size  0.57295    0.04126   13.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testResiduals(model_nbinom1)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.021229, p-value = 0.5442
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 1.3377, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 5, simulations = 1420, p-value = 0.07025
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001144258 0.008197856
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.003521127
The nbinom1 fit is better (lower AIC), but the residuals still show some over-dispersion. We can also try nbinom2, where the variance grows quadratically with the mean:

model_nbinom2 <- glmmTMB(reproductive_success ~ 1 + gonopodium_size + (1|tank_id), data = fish, family = nbinom2())
summary(model_nbinom2)
##  Family: nbinom2  ( log )
## Formula:          reproductive_success ~ 1 + gonopodium_size + (1 | tank_id)
## Data: fish
## 
##      AIC      BIC   logLik deviance df.resid 
##   3174.8   3195.8  -1583.4   3166.8     1416 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance  Std.Dev. 
##  tank_id (Intercept) 2.021e-09 4.496e-05
## Number of obs: 1420, groups:  tank_id, 14
## 
## Overdispersion parameter for nbinom2 family (): 0.762 
## 
## Conditional model:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -7.33988    0.49339  -14.88   <2e-16 ***
## gonopodium_size  0.68396    0.04749   14.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testResiduals(model_nbinom2)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.01608, p-value = 0.8563
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 1.2224, p-value = 0.016
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 6, simulations = 1420, p-value = 0.1333
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001552160 0.009173968
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.004225352

Instead of using a different family, we can try to model over-dispersion with an observation-level random effect:

model_poisson_re <- glmmTMB(reproductive_success ~ 1 + gonopodium_size + (1|id)+ (1|tank_id), data = fish, family = poisson)
summary(model_poisson_re)
##  Family: poisson  ( log )
## Formula:          reproductive_success ~ 1 + gonopodium_size + (1 | id) + (1 |  
##     tank_id)
## Data: fish
## 
##      AIC      BIC   logLik deviance df.resid 
##   3133.5   3154.6  -1562.8   3125.5     1416 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance  Std.Dev. 
##  id      (Intercept) 1.134e+00 1.065e+00
##  tank_id (Intercept) 2.499e-10 1.581e-05
## Number of obs: 1420, groups:  id, 1420; tank_id, 14
## 
## Conditional model:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -7.81435    0.52769  -14.81   <2e-16 ***
## gonopodium_size  0.67597    0.05006   13.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testResiduals(model_poisson_re)

## $uniformity
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.012294, p-value = 0.9827
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## ratioObsSim = 1.0481, p-value = 0.52
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test
## 
## data:  simulationOutput
## outliers at both margin(s) = 5, simulations = 1420, p-value = 0.07025
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001144258 0.008197856
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.003521127
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
node [shape = square, color = red]
repro; size

node [shape = circle, color = black]
lambda; y; noise; baselines;

node [shape = diamond, color=green]
intercept; slope;

y -> lambda [label = '  exp']; 
lambda -> repro [label = '  Poisson', style=dashed];
intercept -> baselines;
baselines ->y; size -> slope -> y ; noise -> baselines;
V_tank -> noise [label = '  Gaussian', style=dashed];
V_id -> noise [label = '  Gaussian', style=dashed];
}")

This model says something like: each observation arises from the effect of the predictor plus some unexplained noise, and that noise is additive on the log scale, which means the unknown effects are multiplicative on the data scale. It does not always work, but it is often a reasonable model for biological data.
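We can check with a quick simulation that Gaussian noise added on the log scale of a Poisson mean does generate over-dispersion (the numbers below are made up):

```r
set.seed(1)
n <- 10000
eta <- log(2) + rnorm(n, 0, 1)  # linear predictor + observation-level noise
y <- rpois(n, exp(eta))         # Poisson counts with a lognormal mean

mean(y)           # around exp(log(2) + 1/2), about 3.3 (Jensen again)
var(y) / mean(y)  # well above 1: over-dispersed relative to a Poisson
```

This is exactly the structure the observation-level random effect captures, which is why it absorbed the over-dispersion in our fish model.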